Correcting for Optimistic Prediction in Small Data Sets

Related articles

Correcting for Optimistic Prediction in Small Data Sets

The C statistic is a commonly reported measure of screening test performance. Optimistic estimation of the C statistic is a frequent problem because of overfitting of statistical models in small data sets, and methods exist to correct for this issue. However, many studies do not use such methods, and those that do correct for optimism use diverse methods, some of which are known to be biased. W...

Correcting MM estimates for "fat" data sets

Regression MM estimates require the estimation of the error scale, and the determination of a constant that controls the efficiency. These two steps are based on asymptotic results that are derived assuming that the number of predictors p remains fixed while the number of observations n tends to infinity, which means assuming that the ratio p/n is “small”. However, many high-dimensional data sets ...

Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach

Feature selection can significantly be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not have decent performance in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...

Kneser-Ney Smoothing With a Correcting Transformation for Small Data Sets

We present a technique which improves the Kneser–Ney smoothing algorithm on small data sets for bigrams, and we develop a numerical algorithm which computes the parameters for the heuristic formula with a correction. We give motivation for the formula with correction on a simple example. Using the same example, we show the possible difficulties one may run into with the numerical algorithm. App...

Link Prediction in Highly Fractional Data Sets

Extremist organizations all over the world increasingly use online social networks as a communication media for recruitment and planning. As such, online social networks are also a source of information utilized by intelligence and counter terror organizations investigating the relationships between suspected individuals. Unfortunately, the data mined from open sources is usually far from being...

Journal

Journal title: American Journal of Epidemiology

Year: 2014

ISSN: 0002-9262, 1476-6256

DOI: 10.1093/aje/kwu140